Outcome: Data Lake
Purpose
To store all the data that can be used by the Big Data ecosystem.
Relationships
Main Description

A data lake is a storage repository that holds a huge amount of raw data in the form in which it was generated, without requiring that the data be processed first. Data lakes typically store unstructured data, but they can also combine other kinds of data, such as structured and semi-structured data. The data lake is part of the Collector component, as it stores the raw data received from the data sources; for that reason, it must be aligned with the defined requirements. Because a huge amount of disorganized data is hard to manage, the data lake is easier to manage when metadata is used to describe and organize its contents.
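To make the two ideas above concrete (raw data kept exactly as received, plus metadata to keep it manageable), here is a minimal sketch in Python. It assumes a local directory stands in for the repository; a production data lake would typically sit on distributed or object storage instead. The names RawDataLake, DatasetMetadata, ingest, find, read and the catalog.jsonl file are hypothetical and illustrative only, not part of any specific product or of the Collector component's actual design.

```python
from __future__ import annotations

import hashlib
import json
import time
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record kept alongside each raw object."""
    object_id: str
    source: str          # data source the raw data came from
    data_format: str     # e.g. "json", "csv"; recorded, not validated
    ingested_at: float   # Unix timestamp of ingestion
    tags: list[str] = field(default_factory=list)


class RawDataLake:
    """Sketch: raw bytes stored unmodified, plus a metadata catalog."""

    def __init__(self, root: Path) -> None:
        self.raw_dir = root / "raw"
        self.raw_dir.mkdir(parents=True, exist_ok=True)
        self.catalog_path = root / "catalog.jsonl"

    def ingest(self, payload: bytes, source: str, data_format: str,
               tags: list[str] | None = None) -> DatasetMetadata:
        # Content-addressed ID: identical payloads map to the same stored object.
        object_id = hashlib.sha256(payload).hexdigest()
        # Store the data exactly as received; no parsing or schema is applied.
        (self.raw_dir / object_id).write_bytes(payload)
        meta = DatasetMetadata(object_id, source, data_format,
                               time.time(), list(tags or []))
        # Append the metadata record to a JSON Lines catalog file.
        with self.catalog_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(meta)) + "\n")
        return meta

    def find(self, **criteria: str) -> list[DatasetMetadata]:
        """Return catalog records whose fields equal all given criteria."""
        if not self.catalog_path.exists():
            return []
        results = []
        with self.catalog_path.open(encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                if all(record.get(k) == v for k, v in criteria.items()):
                    results.append(DatasetMetadata(**record))
        return results

    def read(self, object_id: str) -> bytes:
        # Interpreting the bytes is left to downstream consumers.
        return (self.raw_dir / object_id).read_bytes()
```

For example, after `lake = RawDataLake(Path("lake"))`, calling `lake.ingest(b'{"temp": 21.5}', source="sensor-17", data_format="json")` stores the payload untouched, and `lake.find(source="sensor-17")` locates it again through its metadata alone. The design choice here is that nothing is parsed at ingestion time; only the metadata catalog imposes any organization on the stored data.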

References:

  • N. Miloslavskaya and A. Tolstoy, ‘Application of big data, fast data, and data lake concepts to information security issues’, in Proc. 2016 4th International Conference on Future Internet of Things and Cloud Workshops (W-FiCloud 2016), 2016, pp. 148–153.
  • C. Diamantini, P. L. Giudice, L. Musarella, D. Potena, E. Storti, and D. Ursino, ‘A new metadata model to uniformly handle heterogeneous data lake sources’, Commun. Comput. Inf. Sci., vol. 909, pp. 165–177, 2018.